7 research outputs found

    Skeleton-Based Human Action Recognition with Global Context-Aware Attention LSTM Networks

    Full text link
    Human action recognition in 3D skeleton sequences has attracted considerable research attention. Recently, Long Short-Term Memory (LSTM) networks have shown promising performance on this task owing to their strength in modeling the dependencies and dynamics in sequential data. Since not all skeletal joints are informative for action recognition, and irrelevant joints often introduce noise that degrades performance, more attention should be paid to the informative ones. However, the original LSTM network has no explicit attention ability. In this paper, we propose a new class of LSTM network, the Global Context-Aware Attention LSTM (GCA-LSTM), for skeleton-based action recognition. This network selectively focuses on the informative joints in each frame of a skeleton sequence by using a global context memory cell. To further improve the attention capability of our network, we also introduce a recurrent attention mechanism, with which the attention performance of the network is enhanced progressively. Moreover, we propose a stepwise training scheme to train our network effectively. Our approach achieves state-of-the-art performance on five challenging benchmark datasets for skeleton-based action recognition.
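The joint-level attention the abstract describes can be sketched as a soft weighting of per-joint features against a global context vector. The following is a minimal NumPy illustration, not the authors' implementation; the function name, the scoring matrix `w`, and the feature dimensions are all hypothetical:

```python
import numpy as np

def joint_attention(joint_feats, global_context, w):
    """Soft attention over skeletal joints in one frame.

    joint_feats:    (J, D) per-joint features
    global_context: (D,)   global context memory vector
    w:              (D, D) scoring matrix (learned in practice; fixed here)
    """
    # Score each joint against the global context, then normalize via softmax
    scores = joint_feats @ w @ global_context          # (J,)
    scores -= scores.max()                             # numerical stability
    attn = np.exp(scores) / np.exp(scores).sum()       # (J,) attention weights
    # Informative joints receive larger weights in the pooled representation
    pooled = (attn[:, None] * joint_feats).sum(axis=0) # (D,)
    return attn, pooled

rng = np.random.default_rng(0)
J, D = 25, 8                          # e.g. 25 joints, 8-dim features
feats = rng.normal(size=(J, D))
ctx = rng.normal(size=D)
W = rng.normal(size=(D, D)) * 0.1
attn, pooled = joint_attention(feats, ctx, W)
```

In the recurrent attention mechanism the paper proposes, a pooled representation like this would feed back into updating the global context memory, so the attention weights can be refined over multiple passes.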

    Techniques in enhancing computation and understanding of convolutional neural networks

    No full text
    Convolutional Neural Networks (CNNs) are effective in solving a large number of complex tasks. The performance of CNNs currently equals or even surpasses the human performance level in a wide range of real-world problems. Such high performance is achieved at the cost of high computational and storage requirements. To satisfy these computational requirements, specialized hardware such as Graphics Processing Units (GPUs) or Tensor Processing Units (TPUs) is required. Moreover, CNNs are mainly used as a black-box tool, and only a few attempts have been made toward understanding them. In this thesis, two studies are provided to address the problems of the lack of understanding and the high computational requirements of CNNs. The first study, introduced in Chapter 3, investigates and proposes a method for enhancing CNN computation by reducing the number of computational operations performed. We propose a new method for computation enhancement in CNNs that substitutes Multiply and Accumulate (MAC) operations with a codebook lookup. The proposed method, Quantized-by-Lookup Network (QL-Net), combines several concepts: (i) codebook construction, (ii) a layer-wise retraining strategy, and (iii) substitution of the MAC operations with a lookup of the convolution responses at inference time. The proposed QL-Net achieves good performance on datasets such as MNIST and CIFAR-10. The second study provides a better understanding of CNNs by studying the importance of each learned feature for the recognition of an individual object class. The experimental work in Chapter 4 extends the current understanding of the CNN filters' roles, their mutual interactions, and their relationship to classification accuracy. Additionally, the study shows that the classification accuracy for some classes in the target object set can be improved by removing the subset of filters with the least contribution to these classes. Master of Engineering
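The MAC-to-lookup substitution in concept (iii) can be sketched as follows: responses of every codeword against every filter are precomputed once, and at inference each input patch is quantized to its nearest codeword so the convolution response is read from a table instead of being recomputed. This is a minimal NumPy sketch of the idea, not the thesis implementation; the function names, codebook size, and dimensions are illustrative assumptions:

```python
import numpy as np

def build_response_table(codebook, filters):
    """Precompute convolution responses for every (codeword, filter) pair."""
    return codebook @ filters.T                     # (K, F)

def lookup_conv(patches, codebook, table):
    """Replace per-patch MAC operations with a codebook lookup.

    patches:  (N, D) flattened input patches
    codebook: (K, D) codewords
    table:    (K, F) precomputed responses
    """
    # Nearest-codeword assignment is the only remaining arithmetic
    d2 = ((patches[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (N, K)
    idx = d2.argmin(axis=1)                         # (N,)
    return table[idx]                               # (N, F) looked-up responses

rng = np.random.default_rng(1)
patches = rng.normal(size=(32, 9))   # e.g. flattened 3x3 patches
codebook = rng.normal(size=(16, 9))  # 16 codewords
filters = rng.normal(size=(4, 9))    # 4 convolution filters
table = build_response_table(codebook, filters)
approx = lookup_conv(patches, codebook, table)
```

The approximation error depends on how well the codebook covers the patch distribution, which is why the thesis pairs the lookup with codebook construction and layer-wise retraining.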

    Study on emotion recognition bias in different regional groups

    No full text
    Abstract Human-machine communication can be substantially enhanced by the inclusion of high-quality real-time recognition of spontaneous human emotional expressions. However, successful recognition of such expressions can be negatively impacted by factors such as sudden variations in lighting or intentional obfuscation. Reliable recognition can be more substantively impeded by the observation that the presentation and meaning of emotional expressions can vary significantly with the culture of the expressor and the environment within which the emotions are expressed. As an example, an emotion recognition model trained on a regionally specific database collected from North America might fail to recognize standard emotional expressions from another region, such as East Asia. To address the problem of regional and cultural bias in emotion recognition from facial expressions, we propose a meta-model that fuses multiple emotional cues and features. The proposed approach integrates image features, action level units, micro-expressions, and macro-expressions into a multi-cues emotion model (MCAM). Each of the facial attributes incorporated into the model represents a specific category: fine-grained content-independent features, facial muscle movements, short-term facial expressions, and high-level facial expressions. The results of the proposed meta-classifier (MCAM) approach show that: a) the successful classification of regional facial expressions is based on non-sympathetic features; b) learning the emotional facial expressions of some regional groups can confound the successful recognition of emotional expressions of other regional groups unless it is done from scratch; and c) certain facial cues and features of the datasets preclude the design of a perfectly unbiased classifier.
    As a result of these observations we posit that to learn certain regional emotional expressions, other regional expressions first have to be “forgotten”.
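The meta-level fusion of per-cue predictions described above can be sketched as a weighted combination of cue-specific class distributions. This is a minimal NumPy illustration of cue fusion in general, not the MCAM meta-classifier itself; the weights, class count, and example probabilities are invented for demonstration:

```python
import numpy as np

def fuse_cues(cue_probs, weights):
    """Meta-level fusion of per-cue emotion predictions.

    cue_probs: (C, E) softmax outputs from C cue models over E emotion classes
               (e.g. image features, action units, micro- and macro-expressions)
    weights:   (C,)   per-cue reliability weights (learned in practice)
    """
    fused = weights @ cue_probs     # (E,) weighted vote over classes
    return fused / fused.sum()      # renormalize to a probability distribution

# Four cue models voting over six emotion classes
cue_probs = np.array([
    [0.6, 0.1, 0.1, 0.1, 0.05, 0.05],    # image features
    [0.5, 0.2, 0.1, 0.1, 0.05, 0.05],    # action units
    [0.3, 0.4, 0.1, 0.1, 0.05, 0.05],    # micro-expressions
    [0.7, 0.1, 0.05, 0.05, 0.05, 0.05],  # macro-expressions
])
weights = np.array([0.3, 0.3, 0.2, 0.2])
fused = fuse_cues(cue_probs, weights)
pred = fused.argmax()   # index of the predicted emotion class
```

A real meta-classifier would learn the combination (e.g. via stacking) rather than use fixed weights, which is what lets it trade cues off against regional bias in the training data.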

    Skeleton-Based Human Action Recognition With Global Context-Aware Attention LSTM Networks

    No full text